COVSMA stands for Copernicus Satellites Versus Maladies. The current health crisis has created the need for an online tool that monitors pollution levels, raises alerts intended to trigger government measures (for example, days on which cars and trucks that are not 100% electric are banned), tracks the live impact of those measures, and forecasts COVID-19 risk up to 4 days ahead. COVID-19 risk is defined as the predicted number of new hospitalisations due to severe COVID-19 cases in each state or department. By analogy, we have named this tool COVSCO (Copernicus Satellites Versus COVID19). We start with France and its 96 departments. A follow-up will apply the same methodology to severe respiratory diseases and expand the model and databases to a worldwide scale.
nom numero time hospi reanim newhospi newreanim \
34 Ain 1.0 2020-05-14 137.0 8.0 4.0 0.0
35 Ain 1.0 2020-05-15 135.0 7.0 4.0 0.0
36 Ain 1.0 2020-05-16 134.0 6.0 1.0 0.0
37 Ain 1.0 2020-05-17 133.0 6.0 1.0 0.0
38 Ain 1.0 2020-05-18 132.0 6.0 1.0 0.0
... ... ... ... ... ... ... ...
34171 Val-d'Oise 95.0 2021-03-27 659.0 79.0 43.0 7.0
34172 Val-d'Oise 95.0 2021-03-28 667.0 81.0 38.0 7.0
34173 Val-d'Oise 95.0 2021-03-29 680.0 76.0 44.0 4.0
34174 Val-d'Oise 95.0 2021-03-30 688.0 75.0 88.0 5.0
34175 Val-d'Oise 95.0 2021-03-31 698.0 77.0 83.0 7.0
deces gueris dep_num ... depnum_y Smokers Nb_susp_501Y_V1 \
34 88.0 318.0 1.0 ... 1 0.262 0
35 89.0 323.0 1.0 ... 1 0.262 0
36 90.0 325.0 1.0 ... 1 0.262 0
37 90.0 326.0 1.0 ... 1 0.262 0
38 90.0 331.0 1.0 ... 1 0.262 0
... ... ... ... ... ... ... ...
34171 1603.0 6857.0 95.0 ... 95 0.213 9561
34172 1606.0 6882.0 95.0 ... 95 0.213 9557
34173 1615.0 6902.0 95.0 ... 95 0.213 9496
34174 1632.0 6964.0 95.0 ... 95 0.213 8643
34175 1639.0 7024.0 95.0 ... 95 0.213 7915
Nb_susp_501Y_V2_3 minority pauvrete rsa ouvriers \
34 0 54821.0 10.7 2.3 17.74
35 0 54821.0 10.7 2.3 17.74
36 0 54821.0 10.7 2.3 17.74
37 0 54821.0 10.7 2.3 17.74
38 0 54821.0 10.7 2.3 17.74
... ... ... ... ... ...
34171 432 161947.0 16.8 5.8 17.63
34172 428 161947.0 16.8 5.8 17.63
34173 396 161947.0 16.8 5.8 17.63
34174 364 161947.0 16.8 5.8 17.63
34175 332 161947.0 16.8 5.8 17.63
totalcovidcasescumulated prevdaytotalcovidcasescumulated
34 36.0 18
35 46.0 36
36 46.0 46
37 46.0 46
38 68.0 46
... ... ...
34171 225860.0 223604
34172 226188.0 225860
34173 230717.0 226188
34174 234429.0 230717
34175 237793.0 234429
[30912 rows x 93 columns]
| | idx | pm25 | no2 | o3 | pm10 | co | pm257davg | no27davg | o37davg | co7davg | ... | normno27davg | normo37davg | normpm107davg | normco7davg | normpm251Mavg | normno21Mavg | normo31Mavg | normpm101Mavg | normco1Mavg | newhospi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | 631877.0 | 6.403166 | 3.939003 | 43.262828 | 7.967417 | 177.622754 | 7.257504 | 3.476745 | 72.813923 | 161.817954 | ... | 0.046602 | 0.586532 | 0.129071 | 0.234867 | 0.108579 | 0.048632 | 0.559299 | 0.094840 | 0.218359 | 4.0 |
| 35 | 631877.0 | 10.041256 | 4.039703 | 45.365958 | 12.985050 | 177.204810 | 7.283296 | 3.523846 | 71.713745 | 162.361518 | ... | 0.047305 | 0.577448 | 0.127923 | 0.236187 | 0.119927 | 0.051150 | 0.496692 | 0.102941 | 0.226082 | 4.0 |
| 36 | 631877.0 | 8.650893 | 2.993409 | 64.447998 | 10.213892 | 173.986833 | 7.222348 | 3.502823 | 70.928693 | 162.525890 | ... | 0.046991 | 0.570966 | 0.125429 | 0.236587 | 0.129652 | 0.051861 | 0.455272 | 0.105411 | 0.234060 | 1.0 |
| 37 | 631877.0 | 7.924968 | 2.470320 | 81.736362 | 11.228378 | 163.052671 | 7.159819 | 3.487527 | 70.520307 | 162.329540 | ... | 0.046763 | 0.567594 | 0.123235 | 0.236110 | 0.127766 | 0.051226 | 0.458184 | 0.101608 | 0.239713 | 1.0 |
| 38 | 631877.0 | 8.803713 | 2.883282 | 79.918855 | 11.338718 | 186.330957 | 7.143561 | 3.477693 | 69.999498 | 162.848931 | ... | 0.046616 | 0.563293 | 0.120713 | 0.237371 | 0.142927 | 0.049201 | 0.477197 | 0.113024 | 0.255846 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34171 | 1215390.0 | 6.100140 | 6.627001 | 70.229467 | 11.613934 | 185.482428 | 14.580851 | 14.493739 | 56.915039 | 207.123258 | ... | 0.211142 | 0.455257 | 0.281847 | 0.344925 | 0.302061 | 0.237179 | 0.471712 | 0.271828 | 0.352307 | 43.0 |
| 34172 | 1215390.0 | 8.425896 | 9.099569 | 68.209629 | 13.743489 | 193.139682 | 14.195306 | 14.223872 | 57.801110 | 206.960739 | ... | 0.207112 | 0.462573 | 0.269762 | 0.344530 | 0.289930 | 0.240049 | 0.480526 | 0.265162 | 0.358714 | 38.0 |
| 34173 | 1215390.0 | 13.033086 | 29.526526 | 54.441418 | 18.517216 | 237.388093 | 14.249303 | 14.775230 | 57.704602 | 207.544318 | ... | 0.215346 | 0.461776 | 0.270202 | 0.345948 | 0.295797 | 0.280570 | 0.463981 | 0.266083 | 0.371198 | 44.0 |
| 34174 | 1215390.0 | 21.792762 | 51.437356 | 40.131381 | 30.011262 | 307.382279 | 14.525749 | 16.057324 | 57.291931 | 210.476992 | ... | 0.234495 | 0.458369 | 0.274637 | 0.353072 | 0.286940 | 0.319473 | 0.461757 | 0.261254 | 0.379313 | 88.0 |
| 34175 | 1215390.0 | 21.862512 | 49.729555 | 42.796275 | 32.744934 | 290.497534 | 14.654314 | 17.360417 | 57.019812 | 213.319222 | ... | 0.253957 | 0.456122 | 0.278918 | 0.359977 | 0.278900 | 0.366677 | 0.452311 | 0.260245 | 0.393208 | 83.0 |
30912 rows × 56 columns
Index(['idx', 'pm25', 'pm257davg', 'normpm25', 'hospiprevday',
'covidpostestprevday', 'prevdaytotalcovidcasescumulated',
'all_day_bing_tiles_visited_relative_change',
'all_day_ratio_single_tile_users', 'vac1nb', 'vac2nb',
'Insuffisance respiratoire chronique grave (ALD14)',
'Insuffisance cardiaque grave, troubles du rythme graves, cardiopathies valvulaires graves, cardiopathies congénitales graves (ALD5)',
'Smokers', 'minority', 'Nb_susp_501Y_V1', 'Nb_susp_501Y_V2_3',
'1MMaxpm25', 'pm251Mavg', 'pauvrete', 'rsa', 'ouvriers'],
dtype='object')
22
The prediction target is the daily number of new hospitalizations due to severe COVID-19 cases in each French department.
[Figures: plots with y-axis label 'newhospimean']
[95.]
Département Numéro Date of pollution peak 1MMaxo3 \
0 Val-d'Oise 95.0 2020-08-09 122.889769
1 Alpes-de-Haute-Provence 4.0 2020-08-08 119.833327
2 Haut-Rhin 68.0 2020-07-31 116.979563
3 Moselle 57.0 2020-08-10 115.873423
4 Ain 1.0 2020-09-18 114.506434
.. ... ... ... ...
90 Pyrénées-Atlantiques 64.0 2020-08-06 94.093549
91 Gard 30.0 2020-05-20 93.974202
92 Cher 18.0 2020-07-13 93.921768
93 Hautes-Pyrénées 65.0 2020-05-30 93.086194
94 Ariège 9.0 2020-05-29 90.297432
totalcovidcasescumulated Population Index
0 237793 1215390.0
1 20858 161799.0
2 79025 762607.0
3 141705 1044486.0
4 109223 631877.0
.. ... ...
90 62382 670032.0
91 107351 738189.0
92 32904 308992.0
93 25307 228582.0
94 14118 152499.0
[95 rows x 6 columns]
[59.]
Département Numéro Date of pollution peak 1MMaxpm25 \
0 Nord 59.0 2020-11-27 39.932960
1 Haut-Rhin 68.0 2021-02-24 37.243984
2 Deux-Sèvres 79.0 2021-03-09 36.380767
3 Paris 75.0 2021-01-02 35.418335
4 Vienne 86.0 2021-03-09 34.895373
.. ... ... ... ...
90 Alpes-Maritimes 6.0 2021-03-06 20.296409
91 Côte-d'Or 21.0 2021-02-23 20.115372
92 Lozère 48.0 2021-03-04 19.935369
93 Alpes-de-Haute-Provence 4.0 2021-02-23 19.765781
94 Ardèche 7.0 2021-03-04 19.407017
totalcovidcasescumulated Population Index
0 477936 2605238.0
1 79025 762607.0
2 35767 374435.0
3 404122 2206488.0
4 38030 434887.0
.. ... ...
90 219293 1082440.0
91 66868 533147.0
92 10367 76309.0
93 20858 161799.0
94 45658 324209.0
[95 rows x 6 columns]
[75.]
Département Numéro Date of pollution peak 1MMaxno2 \
0 Paris 75.0 2021-03-31 67.312539
1 Hauts-de-Seine 92.0 2021-03-02 64.475306
2 Val-de-Marne 94.0 2021-01-08 53.384538
3 Val-d'Oise 95.0 2021-03-02 51.599818
4 Yvelines 78.0 2020-11-26 43.024886
.. ... ... ... ...
90 Lozère 48.0 2021-01-07 7.727260
91 Corse-du-Sud 201.0 2020-12-09 6.285247
92 Ariège 9.0 2021-01-09 6.254226
93 Pyrénées-Orientales 66.0 2021-01-09 6.071631
94 Haute-Corse 202.0 2021-01-09 5.592579
totalcovidcasescumulated Population Index
0 404122 2206488.0
1 266670 1601569.0
2 265897 1372389.0
3 237793 1215390.0
4 212766 1427291.0
.. ... ...
90 10367 76309.0
91 11183 152730.0
92 14118 152499.0
93 45028 471038.0
94 13884 174553.0
[95 rows x 6 columns]
[92.]
Département Numéro Date of pollution peak 1MMaxco \
0 Hauts-de-Seine 92.0 2020-11-26 476.783872
1 Bas-Rhin 67.0 2020-11-10 442.772829
2 Paris 75.0 2020-11-26 442.031472
3 Val-de-Marne 94.0 2021-01-02 400.354289
4 Bouches-du-Rhône 13.0 2021-02-24 364.207868
.. ... ... ... ...
90 Cantal 15.0 2021-01-06 204.699850
91 Lozère 48.0 2021-01-07 202.482931
92 Hautes-Pyrénées 65.0 2021-01-10 200.001928
93 Ariège 9.0 2021-01-11 189.451920
94 Pyrénées-Orientales 66.0 2021-03-06 182.390438
totalcovidcasescumulated Population Index
0 266670 1601569.0
1 138675 1116658.0
2 404122 2206488.0
3 265897 1372389.0
4 404460 2016622.0
.. ... ...
90 11762 146219.0
91 10367 76309.0
92 25307 228582.0
93 14118 152499.0
94 45028 471038.0
[95 rows x 6 columns]
[67.]
Département Numéro Date of pollution peak 1MMaxpm10 \
0 Bas-Rhin 67.0 2021-02-25 74.188288
1 Haut-Rhin 68.0 2021-02-25 71.831104
2 Corse-du-Sud 201.0 2021-02-06 70.996064
3 Vosges 88.0 2021-02-25 70.504318
4 Haute-Saône 70.0 2021-02-25 69.817902
.. ... ... ... ...
90 Mayenne 53.0 2021-03-03 40.262764
91 Eure 27.0 2021-03-03 39.774691
92 Calvados 14.0 2021-03-02 38.762629
93 Sarthe 72.0 2021-03-03 36.942586
94 Orne 61.0 2021-03-02 36.142811
totalcovidcasescumulated Population Index
0 138675 1116658.0
1 79025 762607.0
2 11183 152730.0
3 41055 372016.0
4 28170 237706.0
.. ... ...
90 29192 307940.0
91 65397 601948.0
92 61873 693579.0
93 57288 568445.0
94 29094 286618.0
[95 rows x 6 columns]
Gradient Boosting for regression.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
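To make the stage-wise idea concrete, here is a minimal from-scratch sketch, not the scikit-learn implementation used in this notebook. It assumes squared-error loss (for which the negative gradient is simply the residual), a single feature, and one-split trees ("stumps"). The function names and the toy data are invented for illustration.

```python
from statistics import mean


def fit_stump(xs, targets):
    """Fit a one-split regression tree (a 'stump') by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # splits that leave both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((t - left_value) ** 2 for t in left)
               + sum((t - right_value) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_stages=50, learning_rate=0.1):
    """Forward stage-wise boosting for squared-error loss."""
    initial = mean(ys)
    current = [initial] * len(ys)
    stumps = []
    for _ in range(n_stages):
        # For the loss 0.5 * (y - f)^2 the negative gradient is the residual y - f.
        residuals = [y - f for y, f in zip(ys, current)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        current = [f + learning_rate * predict_stump(stump, x)
                   for f, x in zip(current, xs)]
    return initial, learning_rate, stumps


def predict(model, x):
    initial, learning_rate, stumps = model
    return initial + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, predictions):
    return mean((y - p) ** 2 for y, p in zip(ys, predictions))


# Toy data (made up for illustration): a step-like target over one feature.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 1.2, 0.9, 1.1, 4.0, 4.2, 3.9, 8.0, 8.1, 7.9]

model = fit_gradient_boosting(xs, ys)
baseline_mse = mse(ys, [mean(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

Each stage only nudges the current prediction by `learning_rate` times the stump's output, which is why many small stages are needed and why the training error shrinks gradually rather than in one step.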
Stack of estimators with a final regressor.
Stacked generalization consists in stacking the outputs of the individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to use the strength of each individual estimator by using their outputs as the input of a final estimator.
Note that estimators_ are fitted on the full X while final_estimator_ is trained using cross-validated predictions of the base estimators using cross_val_predict.
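The distinction between "fitted on the full X" and "trained on cross-validated predictions" can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not scikit-learn's StackingRegressor: there is one feature, two toy base estimators, and a deliberately simple final estimator that learns a single blending weight. All function names and data are invented for the example.

```python
def fit_line(xs, ys):
    """Base estimator 1: least-squares line. Returns a predict function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)


def fit_nearest(xs, ys):
    """Base estimator 2: 1-nearest-neighbour regressor."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]


def out_of_fold_predictions(fit, xs, ys, n_folds):
    """Predict each sample with a model that never saw it during fitting."""
    predictions = [None] * len(xs)
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), n_folds):
            predictions[i] = model(xs[i])
    return predictions


def fit_stack(xs, ys, n_folds=5):
    # Meta-features for the final estimator: cross-validated base predictions.
    a = out_of_fold_predictions(fit_line, xs, ys, n_folds)
    b = out_of_fold_predictions(fit_nearest, xs, ys, n_folds)
    # Final estimator (deliberately simple): y ~ w * a + (1 - w) * b,
    # with the weight w chosen by least squares.
    w = (sum((ai - bi) * (y - bi) for ai, bi, y in zip(a, b, ys))
         / sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # The base estimators used at prediction time are refitted on the full data.
    line, nearest = fit_line(xs, ys), fit_nearest(xs, ys)
    return (lambda x: w * line(x) + (1 - w) * nearest(x)), w


# Toy data (made up): a curved relationship with a small deterministic wobble.
xs = [0.5 * i for i in range(20)]
ys = [x * x + 0.3 * ((7 * i) % 5 - 2) for i, x in enumerate(xs)]

stacked_predict, weight = fit_stack(xs, ys)
print(weight, stacked_predict(4.2))
```

Training the final estimator on out-of-fold predictions matters because a base model such as the nearest-neighbour regressor reproduces its own training targets exactly; blending weights learned from in-sample predictions would therefore overstate how much that model can be trusted.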
TPOT exported current best pipeline:
MSE:
48.927229830941876
MAE:
3.542195756660395
{'fit_time': array([21.91892815, 21.4157958 , 28.4483602 , 23.76390553, 26.08099937]),
 'score_time': array([0.0273695 , 0.02692819, 0.02848053, 0.05226946, 0.02958846]),
 'test_neg_mean_squared_error': array([ -88.03497565, -67.89317231, -18.78724034, -19.55104613, -108.93610049]),
 'train_neg_mean_squared_error': array([-24.80337586, -28.87515478, -33.52673893, -32.6934117 , -24.36486273]),
 'test_neg_mean_absolute_error': array([-4.98415949, -4.4333311 , -2.49463766, -2.54796054, -6.24690705]),
 'train_neg_mean_absolute_error': array([-2.5177104 , -2.62877477, -2.93896481, -2.87846415, -2.45653406])}
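Mean test_neg_mean_squared_error over the 5 folds (negated MSE):
-60.64050698424177
Mean test_neg_mean_absolute_error over the 5 folds (negated MAE):
-4.1413991679799365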
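The two averages above are negative because scikit-learn's `neg_*` scorers return negated errors, so that a higher score is always better. As a small check, the sketch below recomputes them from the per-fold test scores printed above and flips the sign to recover the usual error values. The variable names are invented for illustration.

```python
from math import sqrt
from statistics import mean

# Per-fold test scores copied from the cross-validation output above.
test_neg_mse = [-88.03497565, -67.89317231, -18.78724034, -19.55104613, -108.93610049]
test_neg_mae = [-4.98415949, -4.4333311, -2.49463766, -2.54796054, -6.24690705]

cv_mse = -mean(test_neg_mse)   # flip the sign to get a conventional MSE
cv_mae = -mean(test_neg_mae)   # same for the MAE
cv_rmse = sqrt(cv_mse)         # RMSE is in the same unit as the target

# Index of the fold with the largest test MSE.
worst_fold = max(range(len(test_neg_mse)), key=lambda i: -test_neg_mse[i])

print(f"CV MSE  = {cv_mse:.3f}")
print(f"CV MAE  = {cv_mae:.3f}")
print(f"CV RMSE = {cv_rmse:.3f}")
print(f"Worst fold index = {worst_fold}")
```

The per-fold test MSE ranges from about 19 to about 109, a much wider spread than the training MSE (about 24 to 34), so the averaged figure hides substantial variation between folds.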
Scikit Learn - GradientBoostingRegressor:
index feature_importance
9 vac1nb 0.001813
16 Nb_susp_501Y_V2_3 0.001920
19 pauvrete 0.002501
13 Smokers 0.002902
10 vac2nb 0.002929
1 pm25 0.002964
20 rsa 0.003047
3 normpm25 0.003323
21 ouvriers 0.003725
11 Insuffisance respiratoire chronique grave (ALD14) 0.003786
15 Nb_susp_501Y_V1 0.004309
12 Insuffisance cardiaque grave, troubles du ryth... 0.004505
14 minority 0.004867
0 idx 0.005109
17 1MMaxpm25 0.006621
18 pm251Mavg 0.007130
2 pm257davg 0.009068
7 all_day_bing_tiles_visited_relative_change 0.011923
8 all_day_ratio_single_tile_users 0.039006
6 prevdaytotalcovidcasescumulated 0.122313
5 covidpostestprevday 0.274968
4 hospiprevday 0.481272
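One way to read this table is to add up the importances by feature family. The sketch below does so with the values printed above. The grouping into "epidemic", "mobility" and "pollution" families is our own assumption for illustration, not something the model provides, and the truncated feature name is kept exactly as printed.

```python
# Feature importances copied from the table above.
importances = {
    "vac1nb": 0.001813,
    "Nb_susp_501Y_V2_3": 0.001920,
    "pauvrete": 0.002501,
    "Smokers": 0.002902,
    "vac2nb": 0.002929,
    "pm25": 0.002964,
    "rsa": 0.003047,
    "normpm25": 0.003323,
    "ouvriers": 0.003725,
    "Insuffisance respiratoire chronique grave (ALD14)": 0.003786,
    "Nb_susp_501Y_V1": 0.004309,
    "Insuffisance cardiaque grave, troubles du ryth...": 0.004505,
    "minority": 0.004867,
    "idx": 0.005109,
    "1MMaxpm25": 0.006621,
    "pm251Mavg": 0.007130,
    "pm257davg": 0.009068,
    "all_day_bing_tiles_visited_relative_change": 0.011923,
    "all_day_ratio_single_tile_users": 0.039006,
    "prevdaytotalcovidcasescumulated": 0.122313,
    "covidpostestprevday": 0.274968,
    "hospiprevday": 0.481272,
}

# Hypothetical grouping, chosen for this illustration only.
groups = {
    "epidemic": ["hospiprevday", "covidpostestprevday",
                 "prevdaytotalcovidcasescumulated"],
    "mobility": ["all_day_bing_tiles_visited_relative_change",
                 "all_day_ratio_single_tile_users"],
    "pollution": ["pm25", "normpm25", "pm257davg", "1MMaxpm25", "pm251Mavg"],
}

group_share = {name: sum(importances[f] for f in features)
               for name, features in groups.items()}
group_share["other"] = sum(importances.values()) - sum(group_share.values())

for name, share in sorted(group_share.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {share:.3f}")
```

Impurity-based importances of this kind describe how often and how usefully the trees split on a feature; a small share does not by itself show that a feature has no effect on hospitalizations.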
TPOTRegressor
Version 0.11.6.post1 of tpot is outdated. Version 0.11.7 was released Wednesday January 06, 2021.
TPOT closed during evaluation in one generation.
WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.
TPOT closed prematurely. Will use the current best pipeline.
Best pipeline: ExtraTreesRegressor(CombineDFs(input_matrix, input_matrix), bootstrap=False, max_features=0.5, min_samples_leaf=1, min_samples_split=20, n_estimators=100)
-48.56940976377202
[Figure: seaborn pair plot (PairGrid)]
Although the features describing the spread of the virus and the mobility index dominate our model's feature-importance report, we observe that unusually high ground-level ozone concentrations coincide with the beginning of the epidemic, and that the mean number of new hospitalizations due to severe COVID-19 cases across all French departments increases with the differentials of the PM2.5 1-month maximum and the PM10 7-day average.
We are building a tool that predicts new departmental hospitalizations due to severe COVID-19 cases (COVID risk) up to 4 days ahead, with a mean absolute error of about 4. Once COVID-19 is over, these predictions will be extended to hospitalizations due to respiratory diseases. The tool also ranks French departments by pollution level and raises alerts, for every department, when PM2.5 and PM10 levels are abnormally high.
These alerts are translated into recommendations so that the government can take measures to stop heavy traffic pollution for a period determined by monitoring live pollutant levels.
Because the relationship governing how long traffic of cars and trucks that are not 100% electric must be halted for PM2.5 and PM10 to return to normal levels is not well known, the satellite data analysis tool we are building, with its live graphical visualizations, is intended to track how the concentrations in pollutants (CIPs) fall in response to government measures and to confirm when the CIPs have dropped to an acceptable level.
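The ranking-and-alert logic described above can be sketched as follows. The levels are a subset of the 1MMaxpm25 values from the ranking printed earlier. The threshold of 25.0 is a placeholder chosen only to make the example run, not a value defined by the project, and the function name is hypothetical.

```python
def pollution_alerts(monthly_max, threshold):
    """Return (department, level) pairs above the threshold, highest first."""
    ranked = sorted(monthly_max.items(), key=lambda item: item[1], reverse=True)
    return [(name, level) for name, level in ranked if level > threshold]


# Subset of the 1MMaxpm25 ranking shown earlier in this notebook.
monthly_max_pm25 = {
    "Nord": 39.932960,
    "Haut-Rhin": 37.243984,
    "Deux-Sèvres": 36.380767,
    "Paris": 35.418335,
    "Vienne": 34.895373,
    "Alpes-Maritimes": 20.296409,
    "Côte-d'Or": 20.115372,
    "Lozère": 19.935369,
    "Alpes-de-Haute-Provence": 19.765781,
    "Ardèche": 19.407017,
}

# Placeholder threshold (assumption): replace with the level the project
# actually treats as "abnormally high".
HYPOTHETICAL_PM25_THRESHOLD = 25.0

alerts = pollution_alerts(monthly_max_pm25, HYPOTHETICAL_PM25_THRESHOLD)
for name, level in alerts:
    print(f"ALERT {name}: 1-month max PM2.5 = {level:.1f}")
```

In a live setting the same function could be re-run on each new batch of measurements, so that an alert clears as soon as a department's level falls back under the chosen threshold.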